903 research outputs found

    PROlocalizer: integrated web service for protein subcellular localization prediction

    Subcellular localization is an important protein property that is related to function, interactions, and other features. Because experimental determination of localization can be tedious, especially for large numbers of proteins, a number of prediction tools have been developed. We developed the PROlocalizer service, which integrates 11 individual methods to predict 12 localizations in total for animal proteins. The service accepts submissions of multiple proteins and mutations and generates a detailed, informative report of the predictions and results. PROlocalizer is available at http://bioinf.uta.fi/PROlocalizer/

    Convolutional LSTM Networks for Subcellular Localization of Proteins

    Machine learning is widely used to analyze biological sequence data. Non-sequential models such as SVMs or feed-forward neural networks are often used, although they have no natural way of handling sequences of varying length. Recurrent neural networks such as the long short-term memory (LSTM) model, on the other hand, are designed to handle sequences. In this study we demonstrate that LSTM networks predict the subcellular location of proteins from the protein sequence alone with high accuracy (0.902), outperforming current state-of-the-art algorithms. We further improve performance by introducing convolutional filters, and we experiment with an attention mechanism that lets the LSTM focus on specific parts of the protein. Lastly, we introduce new visualizations of both the convolutional filters and the attention mechanism and show how they can be used to extract biologically relevant knowledge from the LSTM networks.
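
    The abstract does not spell out how its attention mechanism is parameterized. As a hedged illustration only, the sketch below shows generic dot-product attention pooling. Each per-residue hidden state (which an LSTM would produce) is scored against a query vector. The scores are normalized with a softmax, and the weighted average becomes a fixed-length summary of a variable-length protein. The function names, the query vector, and the toy hidden states are hypothetical. This is not the authors' implementation.

```python
import math
from typing import List, Tuple

def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden: List[List[float]], query: List[float]) -> Tuple[List[float], List[float]]:
    """Pool per-residue hidden states into one vector.

    hidden: one hidden-state vector per residue (LSTM outputs in a real model; toy values here).
    query:  hypothetical scoring vector (it would be learned in a real model).
    Returns (context vector, attention weights); the weights sum to 1.
    """
    # Dot-product score for each residue position
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    # Attention-weighted average of the hidden states, one coordinate at a time
    context = [sum(w * state[d] for w, state in zip(weights, hidden)) for d in range(dim)]
    return context, weights

# Toy example: 4 residues with 2-dimensional hidden states (made-up numbers)
hidden_states = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 0.0]]
context, weights = attention_pool(hidden_states, query=[1.0, 0.0])
print([round(w, 3) for w in weights])
```

    The output length depends only on the hidden-state dimension, so proteins of any length map to a vector of the same size. The per-residue weights are the quantity that attention visualizations typically plot along the sequence.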

    Decadal Scale Variability of Larsen Ice Shelf Melt Captured by Antarctic Peninsula Ice Core

    In this study, we used the stable water isotope record (δ18O) from an ice core drilled in Palmer Land, southern Antarctic Peninsula (AP). Using δ18O, we identified two climate regimes during the satellite era. During the 1979–1998 positive phase of the Interdecadal Pacific Oscillation (IPO), a low-pressure system north of the Weddell Sea drove southeasterly winds. These winds are associated with increased intrusion of warm air masses onto the Larsen ice shelves, which melted. They are also associated with decreased sea ice concentration in the Weddell Sea and increased concentration in the Bellingshausen Sea. This climate setting is associated with anomalously low δ18O values (compared with the later IPO period). There was significantly more melt along the northern AP ice shelf margins and on the Larsen D and southern Larsen C ice shelves during the 1979–1998 IPO positive phase. The IPO positive climatic setting coincided with the collapse of the Larsen A ice shelf. In contrast, during the IPO negative phase (1999–2011), northerly winds caused a reduction in sea ice in the Bellingshausen Sea/Drake Passage region. Moreover, a high-pressure system over the Southern Ocean north of the Weddell Sea brought warm, humid low-latitude air over the tip and east of the AP. That setting is associated with increased northern AP snowfall and a high δ18O anomaly, and it is less conducive to Larsen ice shelf melt.

    Ice Core Chronologies from the Antarctic Peninsula: The Palmer, Jurassic, and Rendezvous Age-Scales

    In this study, we present the age scales for three Antarctic Peninsula (AP) ice cores: Palmer, Rendezvous, and Jurassic. All three are intermediate-depth cores, in the 133–141 m depth range. Non-sea-salt sulfate ([nssSO₄²⁻]) and hydrogen peroxide (H2O2) display marked seasonal variability suitable for annual-layer counting. The Palmer ice core covers 390 years (1621–2011 C.E.) and is one of the oldest AP cores. Rendezvous and Jurassic are lower-elevation sites with high snow accumulation and therefore cover shorter intervals: 1843–2011 C.E. and 1874–2011 C.E., respectively. The age scales show good agreement with known volcanic age horizons. The start and end dates of volcanic events in the three chronologies are compared with those in the published WAIS Divide core. The age difference is ±6 months for the Palmer age scale, ±9 months for Rendezvous, and ±7 months for Jurassic. Our results demonstrate the advantage of dating several cores from the same region at the same time. Additional confidence in the age scales can be gained by evaluating the synchronicity of [nssSO₄²⁻] peaks among the sites.
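
    Annual-layer counting relies on the seasonal cycle in species such as [nssSO₄²⁻] and H2O2, where each seasonal maximum marks one year. The sketch below is a deliberately simplified, hypothetical version. It picks local maxima above a fixed threshold in a depth-ordered series (index 0 = surface) and dates them by counting down from an assumed surface year. Real chronologies, including those described here, combine several species, manual inspection, and volcanic tie points. The threshold, function names, and toy data are assumptions, and assigning the topmost peak to the surface year is itself an assumption.

```python
from typing import List

def seasonal_peaks(series: List[float], threshold: float) -> List[int]:
    """Indices of local maxima above `threshold` in a depth-ordered series.

    For a flat-topped peak only the deepest sample of the plateau is kept,
    so each plateau counts once.
    """
    peaks = []
    for i in range(1, len(series) - 1):
        if series[i] > threshold and series[i] >= series[i - 1] and series[i] > series[i + 1]:
            peaks.append(i)
    return peaks

def date_layers(peaks: List[int], surface_year: int) -> List[int]:
    """Assign one calendar year per peak, counting downward from the surface year."""
    return [surface_year - n for n in range(len(peaks))]

# Toy sulfate-like series with three seasonal maxima (made-up values)
toy = [0, 1, 3, 1, 0, 1, 4, 1, 0, 2, 5, 2, 0]
peaks = seasonal_peaks(toy, threshold=2)
print(peaks, date_layers(peaks, surface_year=2011))  # [2, 6, 10] [2011, 2010, 2009]
```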

    Subcellular location prediction of proteins using support vector machines with alignment of block sequences utilizing amino acid composition

    Background: Subcellular location prediction of proteins is an important and well-studied problem in bioinformatics. The task is to predict which part of a cell a given protein is transported to, using the protein's amino acid sequence as input. The problem is becoming more important because information on subcellular location helps with annotating proteins and genes, and the number of complete genomes is increasing rapidly. Since existing predictors are based on various heuristics, it is important to develop a simple method with high prediction accuracy. Results: In this paper, we propose a novel and general prediction method that combines sequence alignment techniques with feature vectors based on amino acid composition. We implemented this method with support vector machines on plant data sets extracted from the TargetP database. In fivefold cross-validation tests, the overall accuracy and average MCC were 0.9096 and 0.8655, respectively. We also applied our method to other datasets, including that of WoLF PSORT. Conclusion: One predictor that uses gene ontology information achieves higher accuracy than ours. However, our accuracies are higher than those of existing predictors that use only sequence information. Since information such as gene ontology can be obtained only for known proteins, our predictor should be useful for subcellular location prediction of newly discovered proteins. Furthermore, the idea of combining alignment with amino acid frequency is novel and general, so it may be applied to other problems in bioinformatics. Our method for plants is also implemented as a web system, available at http://sunflower.kuicr.kyoto-u.ac.jp/~tamura/slpfa.html
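
    The method combines alignment of block sequences with feature vectors based on amino acid composition. The alignment step is not described here in enough detail to reproduce, so the hypothetical sketch below covers only the composition part. It builds a 20-dimensional vector giving the fraction of each standard residue, the kind of fixed-length input an SVM can take regardless of sequence length. The residue ordering and the choice to ignore non-standard letters are assumptions.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, in a fixed (assumed) order

def aa_composition(seq: str) -> List[float]:
    """20-dimensional amino acid composition of `seq`.

    Each entry is the fraction of one standard residue, in AMINO_ACIDS order.
    Non-standard letters (e.g. X, B, U) are ignored; an empty or fully
    non-standard sequence yields an all-zero vector.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for ch in seq.upper():
        if ch in counts:
            counts[ch] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]

# Example: a short made-up peptide
vec = aa_composition("MKKLLAX")
print(len(vec), round(sum(vec), 6))
```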

    A method to improve protein subcellular localization prediction by integrating various biological data sources

    Background: Protein subcellular localization is crucial information for elucidating protein function. Because large-scale genome analysis is needed, computational methods that efficiently predict protein subcellular localization are in high demand. Although much previous work has addressed this task, the problem remains challenging for several reasons. The number of subcellular locations in practice is large. The distribution of proteins across locations is imbalanced, that is, the number of proteins in each location differs markedly. Many proteins are also located in multiple locations. It is therefore necessary to explore new features and appropriate classification methods to improve prediction performance. Results: In this paper we propose a new prediction method that combines two key ideas: 1) information from neighbouring proteins in a probabilistic gene network is integrated to enrich the prediction features; 2) fuzzy k-NN, a classification method based on fuzzy set theory, is applied to predict proteins located at multiple sites. Experiments were conducted on a dataset of budding yeast proteins covering 22 locations, and a significant improvement was observed. Conclusion: Our results suggest that neighbourhood information from functional gene networks is predictive of subcellular localization. The proposed method can therefore be integrated with, and complement, other available prediction methods.
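
    Of the two ideas in the abstract, the fuzzy k-NN step is the one that handles proteins at multiple sites. The sketch below uses the common inverse-distance-weighted fuzzy k-NN formulation with fuzzifier m. It assumes precomputed feature vectors. In the paper these would be enriched with gene-network neighbour information, which is not reproduced here. The parameter values, the membership threshold used to report several locations, and the toy data are hypothetical, not the authors' settings.

```python
import math
from typing import Dict, List, Sequence, Tuple

TrainItem = Tuple[Sequence[float], Dict[str, float]]

def fuzzy_knn(x: Sequence[float], train: List[TrainItem], k: int = 3,
              m: float = 2.0, eps: float = 1e-9) -> Dict[str, float]:
    """Fuzzy k-NN memberships of query `x` in each location label.

    Each training item pairs a feature vector with {label: membership}, so a
    multi-location protein can carry membership in several labels.
    Neighbours are weighted by 1 / distance ** (2 / (m - 1)).
    """
    # k nearest training items by Euclidean distance (sort key avoids comparing dicts on ties)
    neighbours = sorted(((math.dist(x, feats), labels) for feats, labels in train),
                        key=lambda item: item[0])[:k]
    # eps keeps the weight finite when a neighbour sits exactly on the query
    weights = [1.0 / (d ** (2.0 / (m - 1.0)) + eps) for d, _ in neighbours]
    total = sum(weights)
    memberships: Dict[str, float] = {}
    for w, (_, labels) in zip(weights, neighbours):
        for label, u in labels.items():
            memberships[label] = memberships.get(label, 0.0) + w * u
    return {label: s / total for label, s in memberships.items()}

def predict_sites(memberships: Dict[str, float], threshold: float = 0.3) -> List[str]:
    """Report every location whose membership reaches the (hypothetical) threshold."""
    return sorted(label for label, u in memberships.items() if u >= threshold)

# Toy 2-D features; labels and values are made up
train = [((0.0, 0.0), {"nucleus": 1.0}),
         ((0.0, 1.0), {"nucleus": 1.0}),
         ((1.0, 0.0), {"nucleus": 0.5, "cytoplasm": 0.5}),
         ((5.0, 5.0), {"mitochondrion": 1.0})]
u = fuzzy_knn((0.0, 0.1), train, k=3)
print(predict_sites(u, threshold=0.5))  # ['nucleus']
```

    Because memberships are graded rather than a single vote, lowering the threshold lets one protein be assigned to several compartments.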

    ESLpred2: improved method for predicting subcellular localization of eukaryotic proteins

    Background: Two developments motivated us to introduce a new, improved version of ESLpred, our previously developed method for predicting the subcellular localization of eukaryotic proteins. These were the expansion of raw protein sequence databases in the post-genomic era and the availability of newly annotated sequences for major localizations. The subcellular localization of a protein offers essential clues about its function, so a localization predictor can aid and expedite studies aimed at deciphering proteins. However, the robustness of a predictor depends heavily on the quality of the dataset and of the extracted protein attributes. It is therefore imperative to improve the performance of the current method using the latest dataset and crucial input features. Results: Here, we describe improvements in the prediction performance of our popular ESLpred method obtained by using new crucial features as input to a Support Vector Machine (SVM). In addition, the present study includes a recently available, highly non-redundant dataset encompassing kingdom-specific protein sequence sets for three kingdoms: 1198 fungal, 2597 animal, and 491 plant sequences. First, we used a 440-dimensional input feature vector combining evolutionary information in the form of profile composition with whole and N-terminal sequence composition. With this vector, overall accuracies of 72.7%, 75.8%, and 74.5% were achieved for fungi, animals, and plants, respectively, in five-fold cross-validation. Performance improved further when similarity-search-based results were coupled with whole and N-terminal sequence composition and profile composition, yielding overall accuracies of 75.9%, 80.8%, and 76.6%, respectively. These are the best accuracies reported to date on the same datasets. Conclusion: These results support the reliability and accuracy of the SVM modules generated in the present study using sequence and profile compositions along with similarity-search-based results. The modules are implemented as the web server ESLpred2, available at http://www.imtech.res.in/raghava/eslpred2/.
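
    The accuracies above come from five-fold cross-validation on each kingdom-specific set. The hedged sketch below shows the generic procedure. It splits sample indices into five folds, trains on four, tests on the fifth, and computes overall accuracy as the fraction of correctly predicted locations. The plain (non-stratified) random split, the seed, and the function names are assumptions, because the abstract does not say how folds were formed. The SVM training itself is omitted.

```python
import random
from typing import List, Sequence, Tuple

def k_fold_splits(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 and deal them into k folds.

    Returns k (train, test) pairs; every index appears in exactly one test fold.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, test))
    return splits

def overall_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true location label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Example: 12 hypothetical sequences split five ways
for train_idx, test_idx in k_fold_splits(12, k=5):
    print(len(train_idx), test_idx)
```

    Pooling the predictions from all five test folds before calling `overall_accuracy` gives a single overall figure per dataset. A stratified split, which keeps class proportions similar in each fold, is a common alternative when locations are imbalanced.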

    Prediction of nuclear proteins using SVM and HMM models

    Background: The nucleus, a highly organized organelle, plays an important role in cellular homeostasis. Nuclear proteins are crucial for chromosomal maintenance and segregation, gene expression, RNA processing and export, and many other processes. Several methods for predicting nuclear proteins have been developed in the past. The aim of the present study is to develop a new method that predicts nuclear proteins with higher accuracy. Results: All modules were trained and tested on a non-redundant dataset and evaluated using five-fold cross-validation. First, Support Vector Machine (SVM) based modules were developed using amino acid and dipeptide compositions, achieving Matthews correlation coefficients (MCC) of 0.59 and 0.61, respectively. Second, we developed SVM modules using split amino acid composition (SAAC) and achieved a maximum MCC of 0.66. Third, a hidden Markov model (HMM) based module/profile was developed for searching for exclusively nuclear and non-nuclear domains in a protein. Finally, a hybrid module combining the SVM module and the HMM profile achieved an MCC of 0.87 with an accuracy of 94.61%. This method performs better than existing methods when evaluated on blind/independent datasets. Our method predicted 31.51%, 21.89%, 26.31%, 25.72%, and 24.95% of the proteins in the Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, mouse, and human proteomes, respectively, to be nuclear proteins. Based on these modules, we have developed a web server, NpPred, for predicting nuclear proteins (http://www.imtech.res.in/raghava/nppred/). Conclusion: This study describes a highly accurate method for predicting nuclear proteins. This is the first SVM module for predicting nuclear proteins that uses SAAC, in which the amino acid compositions of the N-terminus and of the remaining protein are computed separately. In addition, this is the first study in which exclusively nuclear and non-nuclear domains have been identified and used to predict nuclear proteins. Combining both approaches improved the performance of the method further.
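
    SAAC, as described above, computes amino acid composition separately for the N-terminus and for the rest of the protein, and the modules are compared by MCC. The hypothetical sketch below shows both. The N-terminal segment length (25 residues here) is an assumption, since the abstract does not state it. Returning 0 when the MCC denominator is zero is a common convention rather than something the source specifies.

```python
import math
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues

def composition(seq: str) -> List[float]:
    """Fraction of each standard residue in `seq` (all zeros for an empty segment)."""
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]

def split_aac(seq: str, n_term: int = 25) -> List[float]:
    """40-dimensional SAAC-style vector: composition of the first `n_term`
    residues (assumed length), followed by the composition of the remaining residues."""
    seq = seq.upper()
    return composition(seq[:n_term]) + composition(seq[n_term:])

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Made-up sequence and counts, for illustration only
features = split_aac("M" * 25 + "K" * 10)
print(len(features), mcc(tp=40, tn=45, fp=5, fn=10))
```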

    Identification and Functional Characterization of Gene Components of Type VI Secretion System in Bacterial Genomes

    A new secretion system, called the Type VI Secretion System (T6SS), was recently reported in Vibrio cholerae, Pseudomonas aeruginosa and Burkholderia mallei. A total of 18 genes have been identified as belonging to this secretion system in V. cholerae. Here we attempt to identify the presence of T6SS in other bacterial genomes. This includes identification of orthologous sequences, conserved motifs, domains, families, 3D folds, genomic islands containing T6SS components, phylogenetic profiles and protein-protein associations of these components. Our analysis indicates the presence of T6SS in 42 bacteria and its absence in most of their non-pathogenic species, suggesting a role for T6SS in conferring pathogenicity on an organism. Analysis of genomic regions containing T6SS components, phylogenetic profiles and protein-protein associations of T6SS components points to a few additional genes that could be involved in this secretion system. Based on our studies, functional annotations were assigned to most of the components. With the exception of one gene, we could group all T6SS genes into those belonging to the puncturing device and those located in the outer membrane, transmembrane and inner membrane. Based on our analysis, we have proposed a model of T6SS and compared it with other bacterial secretion systems.
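
    Phylogenetic profiling, one of the analyses listed above, records whether each gene has an ortholog in each genome. Genes with similar presence/absence patterns are candidate functional partners, which is one way to flag additional genes that may belong to a system. The minimal sketch below assumes binary profile strings over a common, ordered set of genomes and a Hamming-distance cutoff. The gene names, profiles, and cutoff are hypothetical and are not the study's data or procedure.

```python
from typing import Dict, List, Tuple

def hamming(a: str, b: str) -> int:
    """Number of genomes in which two presence/absence profiles disagree."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    return sum(x != y for x, y in zip(a, b))

def co_occurring(profiles: Dict[str, str], query: str, max_dist: int = 1) -> List[Tuple[str, int]]:
    """Genes whose profile is within `max_dist` of the query gene's profile,
    sorted by distance and then by name."""
    ref = profiles[query]
    hits = [(gene, hamming(ref, p)) for gene, p in profiles.items() if gene != query]
    return sorted([(g, d) for g, d in hits if d <= max_dist], key=lambda t: (t[1], t[0]))

# Hypothetical genes; '1' = ortholog present, '0' = absent, across 6 genomes
profiles = {
    "gene1": "110110",
    "gene2": "110110",
    "gene3": "110100",
    "gene4": "001001",
}
print(co_occurring(profiles, "gene1"))  # [('gene2', 0), ('gene3', 1)]
```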